.. _quickstart:

==========
Quickstart
==========

This guide will walk you through:

- Defining a task in a simple YAML format
- Provisioning a cluster and running a task
- Using the core SkyPilot CLI commands

Be sure to complete the :ref:`installation instructions <installation>` before continuing with this guide.

Hello, SkyPilot!
------------------

Let's define our first task: a simple "Hello, SkyPilot!" program.

Create a new directory anywhere on your machine:

.. code-block:: console

  $ mkdir hello-sky
  $ cd hello-sky

Copy the following YAML into a ``hello_sky.yaml`` file:

.. code-block:: yaml

  resources:
    # Optional; if omitted, SkyPilot automatically picks the cheapest cloud.
    cloud: aws
    # 1x NVIDIA V100 GPU
    accelerators: V100:1

  # Working directory (optional) containing the project codebase.
  # Its contents are synced to ~/sky_workdir/ on the cluster.
  workdir: .

  # Typical use: pip install -r requirements.txt
  # Invoked under the workdir (i.e., can use its files).
  setup: |
    echo "Running setup."

  # Typical use: make use of resources, such as running training.
  # Invoked under the workdir (i.e., can use its files).
  run: |
    echo "Hello, SkyPilot!"
    conda env list

This defines a task with the following components:

- :code:`resources`: cloud resources the task must be run on (e.g., accelerators, instance type)
- :code:`workdir`: the working directory containing project code that will be synced to the provisioned instance(s)
- :code:`setup`: commands that must be run before the task is executed (invoked under workdir)
- :code:`run`: commands that run the actual task (invoked under workdir)

All these fields are optional.

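For instance, a task can consist of nothing but a ``run`` section. In this minimal sketch, no ``resources`` are given, so SkyPilot falls back to its defaults, and with no ``workdir`` or ``setup``, nothing is synced or installed before ``run`` executes:

.. code-block:: yaml

  # A complete task with only a `run` section.
  # No `resources`: SkyPilot uses its default resources.
  # No `workdir` or `setup`: nothing is synced or installed first.
  run: |
    echo "Hello, SkyPilot!"
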
To launch a cluster and run a task, use :code:`sky launch`:

.. code-block:: console

  $ sky launch -c mycluster hello_sky.yaml

.. tip::

  The first run may take a few minutes. Feel free to read ahead in this guide.

.. tip::

  You can use the ``-c`` flag to give the cluster an easy-to-remember name. If not specified, a name is autogenerated.

The ``sky launch`` command does much of the heavy lifting:

- selects an appropriate cloud and VM based on the specified resource constraints;
- provisions (or reuses) a cluster on that cloud;
- syncs up the :code:`workdir`;
- executes the :code:`setup` commands; and
- executes the :code:`run` commands.

In a few minutes, the cluster will finish provisioning and the task will run.
The output will show ``Hello, SkyPilot!`` followed by the list of installed Conda environments.

Execute a task on an existing cluster
=====================================

Once you have an existing cluster, use :code:`sky exec` to execute a task on it:

.. code-block:: console

  $ sky exec mycluster hello_sky.yaml

The ``sky exec`` command is more lightweight; it only:

- syncs up the :code:`workdir` (so that the task may use updated code); and
- executes the :code:`run` commands.

Provisioning and ``setup`` commands are skipped.

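Keeping slow, one-time steps in ``setup`` pays off here. The sketch below follows the "Typical use" comments in ``hello_sky.yaml``; ``requirements.txt`` and ``train.py`` are hypothetical project files. After editing ``train.py``, running ``sky exec`` against the cluster again re-syncs the workdir and reruns only ``run``, without reinstalling dependencies:

.. code-block:: yaml

  resources:
    accelerators: V100:1

  # Hypothetical project directory containing requirements.txt and train.py.
  workdir: .

  # Run by `sky launch`; skipped by `sky exec`.
  setup: |
    pip install -r requirements.txt

  # Run by both `sky launch` and `sky exec`.
  run: |
    python train.py

If ``requirements.txt`` changes, running ``sky launch`` with the same cluster name reuses the cluster and runs ``setup`` again.
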
Instead of a YAML file, ``sky exec`` also accepts bash commands directly, for example:

.. code-block:: console

  $ sky exec mycluster python train_cpu.py
  $ sky exec mycluster --gpus=V100:1 python train_gpu.py

For interactive or monitoring commands, such as ``htop`` or ``gpustat -i``, use ``ssh`` instead (see below) to avoid the overhead of job submission.

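Inline commands are shorthand for a small task. As a rough sketch, the ``--gpus=V100:1`` command above corresponds to a task YAML that requests the same accelerator and runs the same command:

.. code-block:: yaml

  # Roughly what `sky exec mycluster --gpus=V100:1 python train_gpu.py` submits.
  resources:
    accelerators: V100:1

  run: |
    python train_gpu.py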

View all clusters
=================

Use :code:`sky status` to see all clusters (across regions and clouds) in a single table:

.. code-block:: console

  $ sky status

If you have created several clusters, they are all listed:

.. code-block::

  NAME       LAUNCHED     RESOURCES             COMMAND                            STATUS
  mygcp      1 day ago    1x GCP(n1-highmem-8)  sky launch -c mygcp --cloud gcp    STOPPED
  mycluster  4 mins ago   1x AWS(p3.2xlarge)    sky exec mycluster hello_sky.yaml  UP

For a list of all possible states, see :ref:`cluster states <sky-status>`.

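In the example output above, the ``V100:1`` request in ``hello_sky.yaml`` was served by a ``p3.2xlarge`` instance on AWS. To pin an instance type yourself instead of requesting an accelerator, ``resources`` also accepts ``instance_type``; a sketch of that alternative ``resources`` section:

.. code-block:: yaml

  resources:
    cloud: aws
    # p3.2xlarge has 1x NVIDIA V100, matching the V100:1 request above.
    instance_type: p3.2xlarge
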
.. _ssh:

SSH into clusters
=================

Run :code:`ssh <cluster_name>` to log in to a cluster:

.. code-block:: console

  $ ssh mycluster

:ref:`Multi-node clusters <dist-jobs>` work too:

.. code-block:: console

  # Assuming 3 nodes.

  # Head node.
  $ ssh mycluster

  # Worker nodes.
  $ ssh mycluster-worker1
  $ ssh mycluster-worker2

This works because SkyPilot adds the appropriate entries to ``~/.ssh/config``.

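The three-node cluster in the SSH example above can be requested with the ``num_nodes`` field. A sketch, where the ``run`` command is a hypothetical placeholder that prints each node's hostname:

.. code-block:: yaml

  # One head node plus two workers:
  # mycluster, mycluster-worker1, mycluster-worker2.
  num_nodes: 3

  resources:
    # Requested per node: each of the 3 nodes gets 1x V100.
    accelerators: V100:1

  # `run` is executed on every node of the cluster.
  run: |
    echo "Hello from $(hostname)"

Launching this file with ``sky launch -c mycluster`` yields the head and worker SSH hosts shown above.
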
Because SkyPilot exposes SSH access to clusters, they can easily be used inside
tools such as `Visual Studio Code Remote <https://code.visualstudio.com/docs/remote/remote-overview>`_.

Transfer files
===============

After a task has run, use :code:`rsync` or :code:`scp` to download files (e.g., checkpoints) from the cluster:

.. code-block:: console

    $ rsync -Pavz mycluster:/remote/source /local/dest  # copy from remote VM

For uploading files to the cluster, see :ref:`Syncing Code and Artifacts <sync-code-artifacts>`.

Stop/terminate a cluster
=========================

When you are done, stop the cluster with :code:`sky stop`:

.. code-block:: console

  $ sky stop mycluster

To terminate a cluster instead, run :code:`sky down`:

.. code-block:: console

  $ sky down mycluster

.. note::

    Stopping a cluster preserves the data on its attached disks. Billing for
    the instances stops, but the disks are still charged. The disks are
    reattached when the cluster is restarted (e.g., with ``sky start``).

    Terminating a cluster will delete all associated resources (all billing
    stops), and any data on the attached disks will be lost.  Terminated
    clusters cannot be restarted.

More commands for managing the cluster lifecycle are listed in the :ref:`CLI reference <cli>`.

Scaling out
=========================

So far, we have used SkyPilot's CLI to submit work to, and interact with, a single cluster.
When you are ready to scale out (e.g., to run tens or hundreds of jobs), SkyPilot supports two options:

- Queue many jobs on your cluster(s) with ``sky exec`` (see :ref:`Job Queue <job-queue>`); or
- Use :ref:`Managed Spot Jobs <spot-jobs>` to run on auto-managed spot instances
  (you need not interact with the underlying clusters).

Managed spot jobs run on much cheaper spot instances, with automatic preemption recovery. Try it out with:

.. code-block:: console

  $ sky spot launch hello_sky.yaml

Next steps
-----------

Congratulations!  In this quickstart, you have launched a cluster, run a task, and interacted with SkyPilot's CLI.

Next steps:

- Adapt :ref:`Tutorial: DNN Training <dnn-training>` to start running your own project on SkyPilot!
- See the :ref:`Task YAML reference <yaml-spec>`, :ref:`CLI reference <cli>`, and `more examples <https://github.com/skypilot-org/skypilot/tree/master/examples>`_
- To learn more, try out `SkyPilot Tutorials <https://github.com/skypilot-org/skypilot-tutorial>`_ in Jupyter notebooks

We invite you to explore SkyPilot's unique features in the rest of the documentation.
